Master equation approach to DNA-breathing in heteropolymer DNA 
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After crossing an initial barrier to break the first base-pair (bp) in double-stranded DNA, the 
disruption of further bps is characterized by free energies between less than one to a few fcsT. This 
causes the opening of intermittent single-stranded bubbles. Their unzipping and zipping dynamics 
can be monitored by single molecule fluorescence or NMR methods. We here establish a dynamic 
description of this DNA-breathing in a heteropolymer DNA in terms of a master equation that 
governs the time evolution of the joint probability distribution for the bubble size and position 
along the sequence. The transfer coefficients are based on the Poland-Scheraga free energy model. 
We derive the autocorrelation function for the bubble dynamics and the associated relaxation time 
spectrum. In particular, we show how one can obtain the probability densities of individual bubble 
lifetimes and of the waiting times between successive bubble events from the master equation. A 
comparison to results of a stochastic Gillespie simulation shows excellent agreement. 

PACS numbers: 05.40.-a,82.37.-j,87.15.-v,02.50.-r 



I. INTRODUCTION 



Even at room temperature the DNA double-helix 
opens up locally to form intermittent flexible single- 
stranded domains, the DNA-bubbles. Their size increases 
from a few broken base-pairs (bps) to a few hundred open 
bps just below the melting temperature XL, until a tran- 
sition occurs to full denaturation DNA stabil- 
ity is effected by combination of the free energies £hb for 
breaking a Watson-Crick hydrogen bond between com- 
plementary AT and GC bases in a single bp, and the ten 
independent stacking free energies e s t for disrupting the 
interactions between a nearest neighbor pair of bps. At 
100 mM NaCl concentration and T = 37°C the hydro- 
gen bonding amounts to £hb = hO/c^T for a single AT 
and 0.2fceT for a GC-bond 0]. The weakest (strongest) 
stacking energies was found to be the TA/AT (GC/CG) 
with free energies e 8t = -0.9k B T (-4.U- B T). Note that 
negative values for the free energies denote stable states. 
The relatively small free energies for base stacking stem 
from the fact that relatively large amounts of binding en- 
thalpy on the one hand, and entropy release on breaking 
the stacking interactions and Watson-Crick bonds on the 
other hand, almost cancel. Bubble initiation, in contrast, 
is characterized by breaking of two stacking interactions 
with the first bp, whose enthalpy cost cannot be bal- 
anced by the two, still strongly confined, liberated bases. 
Thus, the creation of two boundaries between the intact 
double-helix and the bubble nucleus is associated with 
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an activation cost of some 7 to 12 k B T corresponding to 
the Boltzmann factor cr ~ 10~ 3 . . . 10" 5 0,011 0. The 
cooperativity parameter ctq is related to the ring-factor £ 
used in 0| , see below. The high bubble initiation bar- 
rier indeed guarantees the stability of DNA at physiolog- 
ical conditions. Bubble opening occurs predominantly in 
zones rich in the weaker AT bps; with increasing T, they 
spread to regions with progressively higher GC content 
[2,00]. Thermally driven, a DNA-bubblc fluctuates in 
size by relative random motion of the two zipper forks. 
In addition to observations using NMR techniques [Icj . 
this DNA-breathing was recently monitored on the single 
bubble level by fluorescence correlation methods, demon- 
strating a multistate kinetics that reflects the stepwise 
unzipping and zipping of bps. The lifetime of a bubble 
was shown to range up to a few ms \H\ . 

From a biology or biochemistry point of view, DNA- 
breathing is of interest, as it is implicated to influence the 
binding of binding proteins, enzymes, and other chemi- 
cals to DNA. Thus, the relatively fast bubble dynamics 
with respect to the binding rates of proteins, that selec- 
tively bind to single-stranded DNA, provides a kinetic 
block preven ting DNA denaturation through these pro- 
teins [3, 0, Q| . This was investigated in detail on the 
bases of a homopolymer approach in Refs. 0,0]. Simi- 
larly, the increase of the bubble formation probability as 
well as of the bubble lifetime at certain places along the 
sequence is believed to facilitate the initiation of tran- 
scription by RNA polymerase. This latter point is stud- 
ied in depth in Ref. [13 ■ 

In this paper we investigate a (2 + Invariable master 
equation, that governs the time evolution of the proba- 
bility distribution P(m, XL,t) to find a bubble consisting 
of m broken bps with left fork position xl along the se- 
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FIG. 1: Clamped DNA domain with internal bps x = 1 to M, 
statistical weights Uhb(x), u s t(x), and tag position xt- The 
DNA sequence enters through the statistical weights u 8 t(ai) 
and Uhb{x) for disrupting stacking and hydrogen bonds, re- 
spectively. The bubble breathing process consists of the ini- 
tiation of a bubble and the subsequent motion of the forks at 
positions xl and xr, see also Fig. [^] 

quence. With this approach, that is, an arbitrary DNA 
sequence can be analyzed, and its breathing behavior pre- 
dicted. We discuss the exact form of the transfer matrix 
containing the rate coefficients for all permitted moves, 
and derive the bubble autocorrelation function with as- 
sociated relaxation time spectrum. To be able to con- 
nect to the time series obtained from the complementary 
stochastic simulation, we derive the probability densi- 
ties according to which individual bubble lifetimes and 
the time intervals between successive bubble events are 
distributed. Finally, we show that in the homopolymcr 
limit, analytical results can be obtained. 



II. ONE BUBBLE PARTITION FUNCTION 
AND TRANSFER COEFFICIENTS 

Below the melting temperature T m , a single bubble 
can be considered to be statistically independent due to 
the high nuclcation barrier for initiating a bubble quan- 
tified by cto < 1 [Tt| . such that opening and merging 
of multiple bubbles are rare, and a one-bubble picture 
is appropriate. In the particular case of the bubble con- 
structs used in the fluorescence correlation experiments 
of Ref. 01 > tne sequence is designed such that there is 
a single bubble domain. Referring to these constructs, 
we consider a segment of double-stranded DNA with M 
internal bps. These bps are clamped at both ends such 
that the bps x = and x = M + 1 are always closed 
(Fig. [TJ. The sequence of bps determines the Boltzmann 
weights Uhb{x) = cxp{eh.b{x) /(ksT)} for Watson-Crick 
hydrogen bonding at position x, and the Boltzmann fac- 
tor u s t(x) = exp{e s t(x) / (ksT)} for pure bp-bp stacking 
between bps x — 1 and x, respectively. In the bubble do- 
main, the left and right zipper fork positions xl and xr 
denoting the right- and leftmost closed bp of the bubble 
are stochastic quantities, whose random motion underlies 
the bubble dynamics. 



Instead of using the fork positions xl and xr, we prefer 
to work with the left fork position xl and the bubble 
size m = Xr — xl — 1. For these variables, the partition 
function of the bubble becomes 

x L +m Xh+m+1 

^{x Ll m) = ——- Y[ Uhb(x) Y[ u st (x) (1) 

X=X L +1 X=X L +1 

for to > 1. At m = 0, we define 3f(m = 0) = 1. In re- 
lation Q), instead of the usual cooperativity parameter 
do we use the factor £' = 2 C £ related to the ring factor 
£ w 10~ 3 introduced in Ref. @. For a homopolymer, 
this £ is related to a by a = £ exp{e s t/(ksT)} The 
denominator in Eq. Q represents the entropy loss on 
formation of a closed polymer loop, where the offset by 
one accounts for finite size effects |(| ^| . The associated 
critical exponent is c = 1.76 [l^. For a given bubble 
size, the partition counts m contributions from bro- 
ken hydrogen bonds and m + 1 from disrupted stacking 
interactions. The partition defines the equilibrium 
probability 

P cq (xr m) = 2f(x L ,m) 

^(°) + E TO =iExz,=o 2r(xL,m) 

for finding a bubble of size m at location xl- 

The two variables m and xl are the slow variables of 
the breathing dynamics, while the polymeric degrees of 
freedom of the relatively small bubble enter effectively 
through the partition QJ. We moreover assume that the 
bubble is always close to thermal equilibrium such that 
the partition defines the transition probabilities be- 
tween different states. These conditions allow us to in- 
troduce the master equation © with its transfer matrix 
W below. To introduce the underlying time scales, we 
first define the transfer coefficients. 

The allowed transitions with the associated transfer 
(rate) coefficients are sketched in Fig. [21 The left zipper 
fork is characterized by the rate t J (xl , m) corresponding 
to the process xl — ► xl + 1 of bubble size decrease, and 
t£(xL,Tn) for xl — ► xl — 1 (bubble size increase). Sim- 
ilarly, we introduce t^(x£,m) for xr — > xr + 1 (bubble 
size increase) and t^(xL, m) for xr — > xr — 1 (decrease). 
These rates are valid for transitions between states with 
to > 1. Bubble opening (initiation) m = — > m = 1 is 
quantified by tg(ii), and bubble closing (annihilation) 
m = I — > by tp(xi). Note that by our definitions 
^q{xl) and Xq{xl) actually correspond to bubble open- 
ing/closing at x = Xl + 1 • Clamping requires that xl > 
and Xr < M + 1, corresponding to reflecting boundary 
conditions j2f| 

t~[(xL = 0, m) = t\(x L ,m = M — xl) = 0. (3) 

In Fig. |3 we sketch schematically the allowed transitions 
in the tu-xl plane. 

In order to define the various transfer rates t, we firstly 
impose the detailed balance conditions (compare poU^ ) 




FIG. 2: Possible bubble (un)zipping transitions: for m > 2, 
the four transfer rates t^ R (xL,m) completely determine the 

transitions out of this state; the coefficients tj^(xL =F 1, m± 1) 
and V^(xL,m =p 1) specify the possible jumps into this state. 
Jumps between state m = 1 and ground state m — are 
described by ^(xl) and Xq(xl)- 




x L =0 x L =l - ° x L =M-l 



FIG. 3: Schematic representation of the (xl, m)-lattice on 
which the DNA-bubble jump process takes place, with the 
permitted transitions; compare to Figs.0and|5| The bound- 
ary conditions Eq. © are also indicated. 



t+(x £ - l,m + 1) 
t£(x L> m) 
t R (x L ,m + 1) 
t+{x L ,m) 

t G (x L ) 



P cq (x Ll m) 
P cq (xL -l,m+l) 

P cq (x L ,m) 
P eq (xL,m + 1) 

P cq (x Ll 1) 

ftq(O) ' 



(4a) 
(4b) 
(4c) 



that ensure relaxation toward the equilibrium distribu- 
tion P eq (xL,iri), see Eq. (J2J. The latter can be seen by 
recalling that the equilibrium distribution J2J is based 
on the Boltzmann factors Whb and u s t through the parti- 
tion . The detailed balance conditions do not uniquely 
define the transfer rates t, leaving a certain freedom of 
choice E2- We use the following conventions. 

To define the zipping rate, we assume that it is in- 
dependent of the position x along the DNA sequence. 
The picture behind is that the closure of the bp is dom- 
inated by the requirement that the two bases diffuse in 
real space until mutual encounter and eventual bond for- 
mation. As sterically AT and GC bps are very similar, 
the zipping rate should not significantly vary with the 
individual nature of the involved bps, and we choose the 
constant rate fc/2, see below. This rate k is the only 
adjustable parameter of our model, and has to be inde- 
pendently determined from experiment or more funda- 
mental models. This choice, as mentioned above, is not 
unique. Instead, an ^-dependence of k could easily be 
introduced by choosing different powers of the statistical 



weights entering the rate coefficients such that they still 
fulfill detailed balance. 

Decrease in bubble size due to zipping close of the bp 
closest to cither the left or the right zipper fork is there- 
fore ruled by 



t R (x L ,m)\ m > 2 = 7;, 



(5) 



respectively. The factor 1/2 is introduced for consistency 
with previous approaches [Til ITf3 |. Note that for sim- 
plicity we do not introduce the hook exponent discussed 
in previous studies 

EI El El This exponent should 
be important for large bubbles, when during the zipping 
process not only the bp at the zipper fork is moved, but 
also part of the vicinal single-strand is dragged or pushed 
along El El El- 

Increase in bubble size is controlled by 



t L (x L ,m) = -u st (x L )u hh (x L )s(m), 



(6a) 



t^(x L ,m) = -u st {x R + l)u hh (x R )s(m), (6b) 



for m > 1, where 



s(m) 



(7) 



For to > 1 we thus take the bubble increase rate coef- 
ficients proportional to the first power of the Arrhcnius 
factor Ustithb = cxp{(ehb + £st)/[kB-T]} times the loop 
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correction s(m). We stress that Eqs. i|6a[l and (|6b|) arc 

dictated by the detailed balance condition, once the con- 
vention lO is established. As noted, detailed balance 
would still be fulfilled, for instance, if only a fractional 
power a q of the Arrhenius factor a appeared in the open- 
ing rates if complemented by the respective power a l ~ q 
in the closing rates. 
Finally, we define 

t+(x L ) = kS,'s(0)u st (x L + l)u hh (x L + l) (8a) 

*u st (x L + 2) 
ta(x L ) = k (8b) 

for bubble initiation and annihilation from and to the 
zero-bubble state to = 0, with the bubble initiation factor 
£' in the expression for t<t,. As bubble initiation involves 
breaking of two stacking interactions at consecutive bps, 
we have the factor u s t[xL + l)Ust(^i + 2) in expression 
(|8ap . The last open bp can zip close from either side, 
so the bubble closing rate X.q(xl) makes up twice the 
zipping rate of a single fork. 

The rates t together with the boundary conditions fully 
determine the bubble dynamics. In the next Section, 
wc establish the master equation for the time evolution 
of the distribution P(m, xl, t) and derive the associated 
quantities. 

III. MASTER EQUATION FOR 
DNA-BREATHING 

The joint probability distribution P(t) = 
P{xl, to, t; x' L , to', 0) measures the likelihood that 
at time t the system is in state {x^, to} and that at t = 
it was in state {x' L ,m'}. Its time evolution is controlled 
by the master equation 

|p(i) = WP{t). (9) 

The explicit form of the transfer matrix W is discussed 
in detail in Sec. IIVI Here, we concentrate on how to 
derive the quantities relevant for the description of DNA- 
breathing experiments. In that course we introduce the 
eigenmode ansatz [53, 

P(t) = Y / c P Q P exp(- Vp t), (10) 

p 

where the coefficients c p are fixed by the initial condition. 
Combining Eqs. (|l(Jfl and ©, the eigenvalue equation 

y^Q P = -n P Qp (ii) 

yields, compare Ref. and below for more details. The 
eigenvalues r/ p and eigenvectors Q p allow one to com- 
pute any quantity of interest. In fact, the autocorrela- 
tion function for bubble breathing and the corresponding 
relaxation time distribution are quite straightforward to 



obtain, see section UlI Al Below, in section All Bl we dis- 
cuss the more subtle point how the probability densities 
for the bubble lifetime and the interbubble event waiting 
time can be derived. 



A. Blinking autocorrelation function of a tagged bp 

Motivated by the fluorescence correlation setup in 
Ref. we arc interested in the state of a tagged bp 
at x = xt, see Fig. ^ In the experiment fluorescence 
occurs if the bps in a A-neighborhood of the fluorophore 
position xt are open Measured fluorescence blink- 
ing time series thus correspond to the stochastic variable 
I(t), defined by 

t! , _ J 1 at least all bps in [xt — A, xt + A] open 
' | otherwise, 

(12) 

The stochastic variable is 7 = 1 if the system is in the 
phase space region defined by 

Rl : {0 ^ x L < xt-A-1, x T -x L + A ^ to ^ Al-x L }. 

(13) 

Conversely, 7 = corresponds to the complement R0. 

The equilibrium autocorrelation function of fluores- 
cence blinking is defined by 

A(x T ,t) = (I(t)I(0))-((I)) 2 : (14) 

where the angles (•} denote an ensemble average over 
the equilibrium distribution P oq . A(xT,t) quantifies the 
relaxation dynamics of the tagged bp. This is evident 
from the identity 

l l 

<i(t)j(o)} = E w *; J, < °) 7 ' = pi 1 ' *; x > °)> ( 15 ) 

1=0 I'=0 

where p(l,t; 1,0) is the survival probability that I(t) = 
1 and that 1(0) = 1. From the definition of the two 
regions Rl and R0 it follows that p(l, t; 1, 0) yields from 
summation of P{xl, m, t; x' L , m', 0) solely over region Rl: 

p(l,i;l,0)= Y, P(x L ,m,t;x' L ,m',0). (16) 

Together with Eq. (|15fl . combined with the eigenmode 
decomposition l|10|l . and under the assumption that ini- 
tially the system is at equilibrium, we obtain 

P(x L ,TO,0;a;' L ,77i',0) = 6 mm >d XLX > L P cq (x L , m), (17) 

such that we can rewrite the autocorrelation function 
11141 in the form 

A(x T ,t) = J2 [T P (x T )} 2 eM-t/T P )- (18) 

p#0 
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Here, we use the relaxation times t p — l/rj p , and abbre- 
viate 

XT-A— 1 M-XL 

T p (x T ) = ^2 ^2 Q P (xL,m). (19) 

xl— rn—XT—XL+A 

In all illustrations, we plot the normalized form 
of the autocorrelation function, A(xt, t)/A(xT, 0) = 

A(x T ,t)/T,[T P (x T )f. 

The autocorrelation function A(xx,t) can be rewrit- 
ten as the integral A{xr 1 t) = J dr exp(— i/r)/(xr, t) 
defining the weighted spectral density (relaxation time 
spectrum) density 

f{x T ,T) = Y}T p {x T )] 2 5{T-r p ). (20) 

This quantity indicates how many different exponential 
modes contribute to the autocorrelation function. If 
f(xT, t) is very narrow, the process is approximately ex- 
ponential, whereas a broad relaxation time spectrum in- 
dicates that many different modes play together. 

B. Survival and waiting time densities of a tagged 

bp 



with the coefficients 

_ = %\L XL , m Qp( x ^ m )? , 22 * 

~ P Ep%E* z ,, m ( 2p(zz,,Tn)] 2 ' 

The sums over m,XL arc restricted to regions Rl and R0 
for the survival and waiting time densities, respectively. 
Both survival and waiting time probability densities are 
normalized, J tp(t)dt = 1 and J <p(t)dt = 1, since c p = 
1. 

We point out that a non-trivial problem connected to 
obtaining the appropriate expressions for ip(t) and 4>(t) 
is how to choose the right initial distribution of states 
(there are many states corresponding to a bubble being 
just open/closed). We chose an initial distribution de- 
termined by the distribution of stationary flux into the 
regions Rl and R0. This choice guarantees that (for long 
times) the ratio of the time spent in the 1 = 1 state versus 
the time spent in the 1 = state is given by the equi- 
librium results as required by ergodicity, see Sec. II VI for 
details. In appendix IaI we briefly discuss how stochastic 
modeling can be used to obtain single bubble time se- 
ries, from which all quantities such as fluorescence blink- 
ing autocorrelation function, as well as the survival and 
waiting time densities can be distilled. Both approaches 
converge nicely [Hill. 



The autocorrelation function A{xt, t) is an equilibrium 
average measure for a single bubble. It does not contain 
any information on how the lifetime of individual bub- 
bles is distributed, nor on the waiting time that elapsed 
between annihilation of a bubble and the initiation of the 
next. This information is provided by the survival time 
and waiting time densities <p(t) and ip(t). Here we derive 
these two quantities. 

Survival time and the waiting time densities corre- 
spond to a first passage problem, to respectively start 
from an initial state with 1(0) = 1 or 1(0) = 0, and 
transit to a state I(t) = or I(t) = 1 after time t. To 
obtain these quantities from the master equation frame- 
work, one needs to solve the reduced eigenvalue problem 

m 

y^Q P = -vpQp (21) 

for coordinates belonging to Rl and R0. Details are col- 
lected in Sec. IIVI The reduced eigenvalue ansatze l|2T|) 
for Rl and R0 possess only positive eigenvalues, fj p > 
for all p. This reflects the fact that there exist transi- 
tions from one region to the other, such that probability 
" leaks out" . In terms of the reduced eigenvalues fj p and 
eigenvectors Q p the survival and waiting time densities 
become 

ip(t) = 2J VpC p e^p(-fj p t) (22a) 

p6R0 

4>(t) = ^2 VpC p exp(-fj p t) (22b) 

pGRl 



IV. MASTER EQUATION - THE DETAILS 

In this Section we show the explicit form for the master 
equation with its transfer matrix W and go into details of 
how to solve it numerically. We also develop a formalism 
to derive the waiting and survival densities ip(t) and 4>(t). 

A. The ^-matrix 

In order to present an explicit expression for the 
^-matrix in Eqs. © and I jlljl it is convenient to re- 
place the two-dimensional grid points (xl , m) by a one- 
dimensional coordinate s counting all lattice points, com- 
pare We choose the enumeration illustrated in fig- 
ure 0] From this figure we notice that m G [0, M] and 
xl G [0, M — m]. We label the ground state m = by 
s = 0. For to > 1 an arbitrary s-point can be obtained 
from a specific (xl,to) according to: 

S = S \-=( m -l)M- ^-^ m - 2 K x L + l (23) 

From this relation we notice that the maximum s value 
is 

s=mM{s} = iffli±i), (24) 

i.e., the size of the relevant ^-matrix (see below) scales 
as M 2 . Expression l|23|l allows us to change the trans- 
fer coefficients to the s- variable, t^, R (xL, m) — > t^~ , R (s), 
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ground state: 



s=M+l 



s=M+2 



m=l [J 



X,=0 



I s=2M-l ~| 

| s=M-l 1 | s=M 1 
x L =M-2 x L =M-l 



FIG. 4: Enumeration scheme for the numerical analysis: The 
two-dimensional grid points (xL,m) are replaced by a one- 
dimensional running variable s. See text for details. 



using the explicit expressions JSJ, (|6a|) . and (|6b|) for the 
transfer coefficients, together with the boundary condi- 
tions in Eqs. ©. Also t^rc^m) — > tg(s), following 
Eqs. (|Hajl and (|8bj) . From Eq. and figure 0] we notice 
that 



m+l 

x L -i 

m—1 

IL + 1 
,|OT-1 

.|m+l 



= s|™ + M - m, for .t l > 1 & to < M - 1, 



(M 
(M 



to 
to 



a\™+M-m- 



h 1), for to > 2, 
h 2), for to > 2, 
1. 



for x L <M-(m+l)km<M - 1. 



We can then write Eq. (jll|) explicitly as 

J2^( s , s ')Qp( s ') = - r )rQr(s), (26) 



where the matrix-elements arc 



(s, s + M — m) = 



t£(s + M - to), 

for sftlXL > 1 & to > I 



(s,s- [M-m+ 1]) = t L (s-[M-' 



1]), 



(s,s- [M-m + 2]) = 



(s,s + M -m + l) = 



for s rh m > 2 

t+(s- [M-m + 2]), 

for s rh m > 2, 

t fl (s + M-m + 1), 

for s rh a;/, < M - (m + 1) 

& 1 < to < M - 1, 

-(t£00 + tz(«) 
+*«(«) +*£(«))> 

for s rh m > 2. (27) 



We have introduced the notation s rh with the meaning "s 
is to be taken for" . The positive terms above correspond 
to jumps to the state {xl,to}, while the negative terms 
correspond to jumps from the state {xl, rn}, see Figs. [2] 
and 13 The probability for a bubble of size to = 1 is 
altered by exchange with the to = 2 state, or the to = 



W(0,x L + l) = t+(x L ),for s rh to = 1, 

= -(ta(x L )+tl(s)+t+(s)), 

for s rh to = 1. (28) 

Finally, for the ground state population, we find 



W(0,x L + 1) = t^(x L ), for x L < M - 1 



M-l 



3^(0,0) 



£4( 



(29) 



i.e., the to = state can change by jumping to this state 
from to = 1 (first term) or by jumping out of the m = 
state (second term). There are M possible jumps out 
from or to the ground state, corresponding to bubble 
opening or closing at any of the M internal bps. The 
remaining matrix elements are equal to zero. The prob- 
lem at hand is that of determining the eigenvalues and 
eigenvectors of the (S + 1) x (S + l)-matrix W above. 
In terms of the running variable s, see Eq. (|23[1 . and the 
W- matrix defined in Eqs. J27J), igEJ and ^ the detailed 
balance conditions can be written as 



W(s, 



)P eq (s') = W(s',s)P eq (s). 



(25) The eigenvectors are orthonormal in the sense |2l 



Q p(s)Q P '(s) 

peq( s ) 



= 5. 



P,P' ■ 



(30) 



(31) 



Convenient checks of the numerical results then include: 
(i) there should be one zero eigenvalue 770 = 0, the corre- 
sponding eigenvector is the equilibrium distribution, i.e. 
Qo{s) = P eq (s); (ii) the remaining eigenvalues should 
be real and negative (so that r] v > for p > 1); (iii) 
The eigenvectors should satisfy the orthonormality rela- 
tion, Eq. I|31|) . Instead of working with the asymmet- 
ric matrix W(s,s'), for numerical purposes it is some- 
times preferable to use the symmetric matrix ^(s, s') = 
jr(s)- 1 / 2 ^(s,s')-^(s') 1/2 ; see Refs. 00 for details. 
Indeed, the Matlab code we used to numerically solve the 
master equation, is based on the ^-matrix. 



B. Survival and waiting time densities 

In this section we derive the expression for the waiting 
and survival time densities given in Eqs. 

Denote by p(t\si n [ t ) the first passage time density start- 
ing from some initial position Sj n i t £ Rl' or R0', see Fig. 
The survival time density <j>(t) and waiting time den- 
sity tp(t) are then given by J2 Sinit p( t l s init)/(si„it), where 
/(■Sinit) is the distribution of initial points G Rl' (for 
4>(t)) or MO' (for ip(t)). Following standard arguments 
(see, e.g, |2(|) we can write the expression for p(i|si n it), 



7 



m=5 



m=4 



m=3 



m=2 



m=l 



m=0 



M=5 
x T =3 



- Rl' 

- RO' 



RO 



x L =0 x L =l x L =2 x L =3 x L =4 



FIG. 5: Schematic of the (xl , m)-points, region Rl (RO), for 
which the stochastic variable takes the value I — 1 (J = 0). 
The boundary points regions Rl' and RO' are also indicated. 
The illustration is for the case M = 5 and xt = 3, with 
A = 0. 



and therefore ip(t) and 4>(t), which becomes: 

i>(t) = E fjpCpexp(-fjpt), (32a) 



where 



(f){t) = ^2 V P CpCxp(~T] p t), (32b) 
peRi 



^=E Qp p ini ; )/( t it) E^w- ^ 

si„it ^eq (Sinit J 

Here, s G R0(R1) for ip(t) (<^(t)), and fj p and Q p (s) are 
determined through the eigenvalue equation ij2*T|) . which 
explicitly becomes in s-space [5^ 



E^(s,s)Q p (s) = -T] p Q p (s) 



(34) 



where s, s G Rl when calculating the survival time den- 
sity, and s,s£ MO for the waiting time density. 

The problem is now reduced to obtaining the distri- 
bution of initial points /(sinit) such that agreement with 
the Gillespie time scries (see Appendix lA")l is obtained for 
long times. We define the rate coefficients for jumps from 
the points in the boundary region Rl' to RO' (see Figs.|3] 
and[SJ): ii-^o(simt) = t^, t^, or t^ 7 where s ini t G Rl'. 
Similarly, for jumps from the points in the boundary re- 
gion RO' to Rl' (s init G RO') we define: to^i(simt) = *£, 
tJi, or Iq. From the detailed balance condition we have 
that 



ti_ >0 (s)P eq (s) = t -i(a')^eq(a')» 



(35) 



where s and s' are points in region Rl' and RO' which 
are connected. For the the survival time density we then 
choose the distribution of initial points proportional to 



the stationary influx from region RO'. Furthermore using 
the detailed balance condition and normalizing we have 
for the initial distribution in the 1=1 state 



/(■Sinit) 



tl^o( s init)-Pcq(sinit) 



52 sinit tl-»o(sinit)-Peq(Sinit) 

Similarly for the initial distribution in the 1=0 state: 

to^l( s mit)^cq(sinit) 



(36) 



/(Sinit) 



S Sill i t ^0-»l( s init)-feq( s init) 



(37) 



which, together with Eqs. l|52ajl . (|32b|) and deter- 
mines ijj(t) and <p(t). We below proceed to show the 
choices above for /(s; n it) satisfy ergodicity requirements. 

Ergodicity requires that the ratio of times spent in the 
1=1 and 1=0 state equals 



^eq — 



Ss6R0 ^eq( S ) 



(38) 



From Eq. I|32b() we have that the mean survival time can 
be written according to: 



/>oo 

/ tcj>(t)dt = Vfer 1 ^ 

Jo 



(39) 



and identically for the moan waiting time r wa ; t . We pro- 
ceed by noticing that the eigenvalue equation (|34|l can be 
written as 

E (^ rofl ( s < s ~) - ^ abs ( s > *)) Qp(«) = -%Qp( s )> (4°) 

8 

where 

#- abs ( S ,s)=t 1 ^ ( s )<5 SjS( 5 s , Sinit , (41) 

with s ini t G Ml', and W Te& (s,s) satisfy E s >^ refl (M) = 
0. Summing Eq. I|34|) over s and using the above identity 
we obtain 

Q p {s) = % 1 tl^o(simt)Q P (Smit) (42) 

8 Si n it 

which is a useful connection between quantities in the 
bulk (s G Rl) and at the boundary Sinit G Rl'. Apply- 
ing this relation to the expressions for the survival time, 
Eqs. (EHJI, (dJ), and (dJ), we find 



E p EsE s Qp(s)Q p (s) 

X)si„it tl^o(Smit)-Foq(Sinit) 

Finally, using the completeness relation [22T | 
lp{s)Q p (s) 



E 



^eq(5) 



we see that 



XsSKl ^eq(s) 



S Si nit t l^o(s)Peq(sinit)' 



(43) 



(44) 



(45) 



8 




density of waiting times iP(t) spent in the 1 = state, 
whose characteristic time scale r' = J Q (LttiI){t) is more 
than an order of magnitude longer than at xt = 41. In 
contrast, we observe similar behavior for the density of 
opening times 4>(r) for xt = 38 and 41. The solid lines 
are the results from the master equation, see subsection 
IIII Bl showing excellent agreement with the Gillespie re- 
sults. Notice that whereas ifj(t) ' 1S characterized by a 
single exponential, <p(t) show a crossover between differ- 
ent regimes. For long times both ip(r) and </>(r) decay 
exponentially as it should for a finite DNA stretch. 



V. REDUCED ONE- VARIABLE SCHEME FOR 
A HOMOPOLYMER 



FIG. 6: Top: Fluorescence time series I(t) for the T7 pro- 
moter sequence, with tag position xt = 38 (red) and xt = 41 
(blue). Bottom: Waiting time (^(t)) and fluorescence sur- 
vival time (</>(t)) densities, in units of k. The data points 
(solid lines) are results from the Gillespie algorithm (master 
equation). All results are for T = 37°C and 100 mM NaCl 
with DNA parameters from ||. 



In a similar fashion 



7"wait 



Es^to^l^-Peq^init)' 



(46) 



which, using the detailed balance condition, Eq. (|35[l . 
shows that T surv /r wait = R cq . 

With the completeness relation (|4*4"|) and Eq. l|4*2|> we 
find from Eqs. I|33|) . 136f) . and 137fl that c p can be written 



(47) 



which is the form given in Eq. I|22c[) . 

We show in Fig. HJ the time series obtained from a 
stochastic simulation (see Ap pendix lAl for a short intro- 
duction, and refer to Ref. J23] for details) for two different 
tag positions in the T7 bacteriovirus promoter sequence 



1 20 
I 

5 ; -aTGACCAGTTGAAGGACTGGAAGTAATACGACTC 

AGTATAGGGAC AATGCTTAAGGTCGCTCTCTAGGAg- 3 ' 

38 41 68 



(48) 



After addressing the derivation of the probability den- 
sities ip(t) and 4>{t), and the details concerning the trans- 
fer matrix W, we show how the master equation for- 
malism reduces when a homopolymer sequence is con- 
sidered, that is, a sequence with only one type of bps 
such as (j4T)tv- Homopolymcrs can be realized exper- 
imentally In the case they are clamped, possible sec- 
ondary structure formation does not appear to occur, 
and our formalism remains valid. In the case of long ho- 
mopolymers, imperfect matching conditions apply, and 
additional degrees of freedom emerge Q- Although this 
can be straightforwardly included in the formalism, we 
do not consider this case here. 

In the homopolymer case, it is possible to obtain ana- 
lytical results. To that end we note that for a homopoly- 
mer, all bps have the same statistical weights u s t(x) and 
Uhb(x)- Formally, we therefore use u = Wst^hb f° r d-is- 
ruption of additional bps after bubble initiation. Due 
to this choice, we need to utilize the initiation factor 
<7o instead of the ring factor £, as a takes care of the 
fact that upon initiation two stacking bonds are broken 
El IE HH ■ If we furthermore assume that we are below 
the melting temperature u < 1, have a long DNA region 
M ^> 1 and consider bubbles far from the clamping, end 
effects are much less pronounced. It then follows that 
P(xL,Tn,t;x' L ,m') = P(m,t;m'), and the master equa- 
tion reduces to 



§- t P(m,t) 



t+(m - l)P(m - l,t) 

+r(m + l)P(m+l,t) 

-(t + (m) +t~(m))P(m,t), (49) 



whose TATA motif is marked in red |28j . A promoter 
is a sequence (often containing the so called TATA mo- 
tif) marking the start of a gene, to which RNA poly- 
merase is recruited and where transcription then initi- 
ates. Fig. shows the signal I(t) at 37°C for the tag 
positions xt = 38 in the core of TATA, and xt = 41 at 
the second GC bp after TATA. Bubble events occur much 
more frequently in TATA (the TA/AT stacking interac- 
tion is particularly weak |8(). This is quantified by the 



with the short-hand notation P(m, t) = P(m, t; m'). The 
forward transfer coefficients in this limit are given by 



i + (m = 0) = ka Q us(0) 
i + (m)\ m >i = kus(m), 



(50) 



where we have incorporated the fact that a bubble size 
increase can occur by opening of a bp at either the left 
or the right fork. For the backward transfer coefficients, 
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wc find 



(51) 



The eigenvalue equation corresponding to Eq. I|49|) has 
the comparatively simple structure 



t+(m-l)Q p (m-l) 
+t"(m + l)Q p (m + 1) 
-(t + (m)+t-(m))Qp(m) 



(52) 



with eigenvalues ry p and eigenvectors Q p (m) (p — 
0, 1, . . . , M). The equation above is identical to the one 
in |16l l24j . and thus our generalized formalism is con- 
sistent with previous homopolymer models [16L 12 1| . We 
note that the equilibrium distribution becomes 



P eq (m) 



3T{m) 



(53) 



where ^(m) = a (l + m)- c u m with 2f(0) = 1, see 
Eqs. © and ©. 

The autocorrelation function is, as before, simply pro- 
portional to the joint probability of having m > 1 at 
time t and to > 1 at initial time t = 0. Proceeding as 
previously, and assuming that initially the system is at 
equilibrium, P(to,0;to',0) = 5 mm i P eq (m), we have 



A(t) = (/(t)/(0)> - «/» : 



T, 



exp 



t 



where T v 



Em=i Q P {m) 



(54) 

Here, we introduced the re- 



laxation times f p = 1/fjp. As before, we write the corre- 
lation function according to A(t) = J dr exp(— f/r)/(r), 
with the relaxation time spectrum 



5{t 



(55) 



In Fig. [7J we compare the approximate result for 
A(xt, t) obtained by numerical solution of Eqs. (|52|l . and 
using Eq. I|54|) . with the general result from the master 
equation in Section II I II We also show the correspond- 
ing weighted spectral densities given by Eq. J32J). We 
note that the approximate expression works well only for 
the case of internal tagging and temperatures below the 
melting temperature (and for a sufficiently long DNA re- 
gion); for a short DNA sequence, close-to-end-tagging or 
high temperatures (i.e., large bubbles) end effects, which 
are not included in the approximate model above, are 
significant. 

In the analysis of Refs. 0,0] it was found that close 
to the melting transition at T m , the mean correlation 
function takes its maximum (critical slowing down). In 
order to get an understanding of this behavior we here 
analytically obtain the largest relaxation time from the 
homopolymer model above. From Reference |l5j| we have 
that the eigenvalues, see Eq. I|52|) . are for c = 



'//< 



k(u + 1 - 2m 1 / 2 cos u p ) 



where u p (0 < lu p < n) is obtained from the transcen- 
dental equation 

g(u p ) = sin[(M + l)u p ] - 6sin[Mu) p ] = (57) 

with S = (1 — (Jo)it 1 / 2 . For u — > 1 and (To — > we get 

g(u p ) = sin[(M + l)u p ] — sin[Mo> p ] 

W v 1 , 



so that we have 



in cos[(M 



(P - 1/2)tt 



)Ul 



M + l/2 



(58) 



(59) 



(56) 



which together with Eq. Ij56(l give the eigenvalues. The 
smallest eigenvalue (largest relaxation time) is obtained 
forp = 1, i.e. jft = 2A(l-cos(7r/[2M+l])) « kn 2 /(2M+ 
l) 2 for M 3> 1, and therefore the largest relaxation time 
becomes 

fl 4^r' (60) 

We notice that the longest relaxation time scales as ~ M 2 
at melting, in agreement with the findings in [3(J ■ Fig- El 
demonstrates the good agreement of the homopolymer 
result (r max , ID in the figure) with the maxima of the 
correlation time, that coincide with the melting concen- 
tration. 



VI. CONCLUSIONS 

In this study we considered the bubble breathing dy- 
namics in a hetcropolymcr DNA-rcgion characterized by 
statistical weights u st (x) for disrupting a stacking inter- 
action between neighboring bps, and the weight Uhh(x) 
for breaking a Watson-Crick hydrogen bond (a; labels dif- 
ferent bps), as well the bubble initiation parameter (the 
ring-factor) oo (£). For that purpose, we introduced a 
(2 + Invariable master equation governing the time evo- 
lution of the probability distribution to find a bubble 
of size to with left fork position ii at time t, as well 
as a complementary Gillespie scheme. The time aver- 
ages from the stochastic simulation agree well with the 
ensemble properties derived from the ME. We calculate 
the spectrum of relaxation times, and in particular the 
experimentally measurable autocorrelation function of a 
tagged bp is obtained. All parameters in our model are 
known from recent equilibrium measurements available 
for a wide range of temperatures and NaCl concentra- 
tions, except for the rate constant k for (un)zipping that 
is the only free fit parameter. A better understanding of 
the zipping rate k remains an open question, requiring a 
detailed microscopic modelling of DNA-breathing. 

For the case of a long homopolymer DNA region with 
internal tagging and below the melting temperature the 
position of the bubble becomes negligible, and the master 
equation reduces to previous (l+l)-variable approaches 
in terms of the bubble size. 
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x (in units of k 1 ) x (in units of k 1 ) x (in units of k 1 ) 



FIG. 7: Autocorrelation A(t) and spectral density /(r) for a tagged bp in a homopolymer region: u = Uhbitst- Left: Close-to- 
end-tagging far from T m (xt = 2, u = 0.6). Middle: Center-tagging far from T m (xt = 20, it = 0.6). Right: Center-tagging 
close to Tm (xt = 20, u = 0.9). In the A(t) plots the blue curves are the exact result. The dashed green curves are 
approximated from from Eqs. 11521 and 1541 . In the spectral density plot the data were collected into 10 bins. The green bars 
are the approximate one- variable results, Eq. 1551 . and the blue bars are the exact result. The length of the DNA segment was 
M — 40. The approximate expression only works well for internal tagging and below T m . 



Smelt (AT) A 
r C melt( Q C) 

Wg 

w 2D - 


/ 7\ A 


















r """I i i 


-'"v 

i • ' ' 


1 '"' 




""1 ' """l I 










T=30°C - 

40°C 

50°C - 

60°C 

70°C - - 
80°C 





















io" 7 icr 6 icr 5 icr 4 io" 3 icy 2 icr 1 io° io 1 10 2 



C(NaCI) [M] 



FIG. 8: Top: Mean correlation time r C orr = / °° dt' A(t')/A(0) 
versus NaCl concentration for various temperatures T for 
the AT9 construct of Ref. [Till , showing a critical slowing 
down at the melting concentration (compare lower panel). 
The triangles denotes the melting concentration for infinitely 
long random AT and GC stretches. The curve denoted by 
Tmaxic is the result given in Eq. (1511 . and r max 2_D = max{r p }, 
p = 1, . . . , S is the maximum relaxation time of the full prob- 
lem specified in Sec. III. Bottom: Opening probability of bp 
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APPENDIX A: GILLESPIE APPROACH 

In this section we briefly review the Gillespie algo- 
rithm. Together with the explicit expressions for the 
transfer cocfficicirts introduced in the previous section 
it is used to generate stochastic time series of bubble 
breathing. In particular we show how the motion of a 
tagged bp is obtained. 

To denote a bubble state of m broken bps at position 
xl we define the occupation number 6(xt,m) for each 
lattice point in Fig. |3| with the properties b(xL, m) = 1 if 
the particular state {ii, to} is occupied and b(xL, to) = 
for unoccupied states. For the completely zipped state 
to = there is no dependence on xl, and we intro- 
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duce the occupation number 6(0). The stochastic DNA 
breathing then corresponds to the nearest neighbor jump 
processes in the triangular lattice in Fig. Each jump 
away from the state {:el,to} (i.e., from the state with 
6(xl, to) = 1) occurs at a random time r and in a random 
direction to one of the nearest neighbors; it is gove rned 
by the reaction probability density function [3ll l32| 

P(t,h,v) =t^(x L ,m)exp ^-r^t£(a; L ,m)^ , (Al) 

which for a given state (xl , m) defines after what waiting 
time r the next step occurs and in what 'direction', v G 
{G,L,i?}, /i G {+/~ }• A simulation run produces a 
time series of occupied states {xl, to} and how long time 
T = T j (i = 1 , - ■ • , -^V, where N is the number of steps 
in the simulation) this particular state is occupied. This 
waiting time r, that is, according to Eq. (|A1|) follows a 
Poisson distribution 



the 1 = state, as well as the survival time distribution 
4>{t) of times in the 1=1 state. Explicit examples for 
ip(T) and 4>{r) are shown in Sec. II VI 

The probability that the tagged bp is open becomes 



A' 



(A3) 



where tj = X)j'=i r j'- For long times the explicit con- 
struction of the Gillespie scheme together with the de- 
tailed balance conditions guarantee that Pa(tj) tends 
to the equilibrium probability, i.e. that Pc(tj ~ * 
°°) = E a;i .m G Ri pCq ( a; i' TO ) ; where P c i(x L ,m) is given 
in Eq. ©. 



2. Tagged base-pair autocorrelation function 



1. Tagged bp survival and waiting time densities 

The stochastic variable I(t) is then obtained by sum- 
ming the Gillespie occupation number b(xL,m) (b(xL, to) 
takes only values or 1) over region Ml, i.e. 

I(t)= ]T b(x L ,m). (A2) 

H.mell 

From the time series for I(t) one can, for instance, calcu- 
late the waiting time distribution "0( T ) of times spent in 



The autocorrelation function for a tagged bp is ob- 
tained through 

A t (x T ,t) = W)I{0)-(W)) 2 

= -J o I(t + t')I(t')dt'-(^J o I(t')dt'\ (A4) 

which for long times agrees with the the ensemble av- 
erage, Eq. 1)14(1 . from the master equation. The func- 
tion At(xT, t) corresponds to the blinking autocorrelation 
function obtained in the FCS experiment from Ref. [Tlj . 
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